Quickstart#
Welcome to AudibleLight!#
This tutorial walks through the data generation and synthesis process end-to-end.
We’ll do the following:
- Create a basic Scene
- Add a tetrahedral microphone to the Scene
- Add some basic static sound Events
- Add some background Ambience
- Add some more advanced Events, including moving events and events with augmentations
- Render the whole scene to a first-order ambisonics audio file and metadata JSON file
For more information on any of these steps, you can check out the API documentation or the other tutorial files.
Import dependencies#
We need a few basic Python dependencies for this notebook. Note that audiblelight.utils contains basic utility functions that will come in handy when working with this package.
[1]:
import os
from pathlib import Path
from scipy import stats
from audiblelight import utils
Import Scene from audiblelight.core#
In this notebook, we’ll mostly be working with the Scene object. We should import it now.
The Scene is the highest level object within the AudibleLight API. It manages the soundscape and any listeners or events added to it, and is used to synthesise the entire audio file and metadata.
[2]:
from audiblelight.core import Scene
A note on backends#
Scene supports multiple backend types (which inherit from audiblelight.state.WorldState):
- Ray-traced RIRs, using rlr-audio-propagation (backend="rlr")
- Measured RIRs, reading from .sofa files in a manner similar to spatialscaper (backend="sofa")
- Parametric (shoebox) RIRs, defined in a similar manner to pyroomacoustics
The underlying API is the same regardless of backend, however, making it easy to create complex datasets that combine different types of room impulse responses.
Set default values#
All of these values can (and should!) be changed in order to experiment with the functionality of AudibleLight.
[3]:
# OUTPUT DIRECTORY
OUTFOLDER = utils.get_project_root() / 'spatial_scenes'
if not os.path.isdir(OUTFOLDER):
    os.makedirs(OUTFOLDER)
[4]:
# PATHS
FG_FOLDER = utils.get_project_root() / "tests/test_resources/soundevents"
MESH_PATH = utils.get_project_root() / "tests/test_resources/meshes/Oyens.glb"
NOISE_TYPE = "white"
[5]:
# SCENE SETTINGS
DURATION = 30.0 # seconds
MIC_ARRAY_NAME = 'ambeovr' # could also be "eigenmike32"...
MAX_OVERLAP = 3 # maximum number of temporally overlapping sound-events
MICROPHONE_POSITION = [2.5, -1.0, 1.0] # inside the living room
[6]:
# SCENE-WIDE DISTRIBUTIONS
MIN_VELOCITY, MAX_VELOCITY = 0.5, 1.5 # meters per second
MIN_SNR, MAX_SNR = 2, 8
MIN_RESOLUTION, MAX_RESOLUTION = 0.25, 2.0 # spatial resolution, in IRs per second
REF_DB = -50 # noise floor
[7]:
# These can be changed at will
N_STATIC_EVENTS = 4
N_MOVING_EVENTS = 1
Make a Scene!#
Now, we’re ready to create a Scene object with the parameters below.
By default, our Scene has the following properties:
A duration of 30 seconds
No more than 3 overlapping sound events at any one time
A noise floor level of -50 dB
Moving events at between 0.5 and 1.5 meters per second
Moving events with between 0.25 and 2.0 IRs per second
Events with maximum peaks at between 2 and 8 dB above the noise floor
For this example, we’ll use backend="rlr".
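One gotcha worth flagging before we build the distributions: scipy's stats.uniform(loc, scale) takes a location and a *width*, not a lower and upper bound, so stats.uniform(2, 8) actually samples from [2, 10]. A quick sketch of sampling the SNR range correctly by passing the width as the second argument:

```python
from scipy import stats

MIN_SNR, MAX_SNR = 2, 8

# stats.uniform(loc, scale) is uniform on [loc, loc + scale],
# so pass the width (MAX - MIN) as the scale argument
snr_dist = stats.uniform(loc=MIN_SNR, scale=MAX_SNR - MIN_SNR)

samples = snr_dist.rvs(size=10_000, random_state=0)
print(samples.min().round(2), samples.max().round(2))  # both within [2, 8]
```

The same applies to the velocity and resolution distributions below.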
[8]:
# This function simply returns a fresh `Scene` object with the parameters set in the cells above
def create_scene() -> Scene:
    return Scene(
        duration=DURATION,
        sample_rate=44100,
        backend="rlr",
        backend_kwargs=dict(mesh=MESH_PATH),
        scene_start_dist=stats.uniform(0.0, DURATION - 1),
        event_start_dist=None,
        event_duration_dist=None,
        # NB: scipy's uniform(loc, scale) covers [loc, loc + scale],
        # so we pass the *width* of each range as the second argument
        event_velocity_dist=stats.uniform(MIN_VELOCITY, MAX_VELOCITY - MIN_VELOCITY),
        event_resolution_dist=stats.uniform(MIN_RESOLUTION, MAX_RESOLUTION - MIN_RESOLUTION),
        snr_dist=stats.uniform(MIN_SNR, MAX_SNR - MIN_SNR),
        fg_path=Path(FG_FOLDER),
        max_overlap=MAX_OVERLAP,
        ref_db=REF_DB,
    )
[9]:
# Create a fresh scene object
scene = create_scene()
CreateContext: Context created
Now, we can visualise the Scene. The resulting object is interactive: try giving it a spin!
[10]:
out = scene.state.create_scene()
out.show()
[10]:
Add a listener#
Now, we’ll add a microphone to our mesh.
In AudibleLight, microphones are represented as subclasses of the audiblelight.micarrays.MicArray dataclass. A variety of standard microphone array geometries are included by default, or you can subclass this dataclass and create your own.
For now, we can use scene.add_microphone to create a tetrahedral microphone inside the living room of our mesh.
The output of this microphone will be in Ambisonics A-Format (sometimes referred to as “MIC”). To work with B-Format directly, we can use the FOAListener object, which will output first-order Ambisonics audio.
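To get a feel for what a custom array definition needs to describe, here is an illustrative plain dataclass mirroring the kind of information a microphone array carries. To be clear, this is a hypothetical stand-in, not the real audiblelight.micarrays.MicArray interface, whose field names and structure may differ:

```python
from dataclasses import dataclass


@dataclass
class ToyTetrahedralArray:
    """Illustrative stand-in for a custom microphone array definition.
    NOT the real MicArray interface; just the shape of the idea."""

    name: str = "toy_tetra"
    is_spherical: bool = True
    capsule_names: tuple = ("FLU", "FRD", "BLD", "BRU")
    # Capsule offsets in metres, relative to the array centre
    capsule_offsets: tuple = (
        (0.0058, 0.0058, 0.0057),
        (0.0058, -0.0058, -0.0057),
        (-0.0058, 0.0058, -0.0057),
        (-0.0058, -0.0058, 0.0057),
    )

    @property
    def n_capsules(self) -> int:
        return len(self.capsule_names)


array = ToyTetrahedralArray()
print(array.n_capsules)  # 4
```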
[11]:
# Add the microphone type we want, at the desired position
scene.add_microphone(microphone_type=MIC_ARRAY_NAME, alias=MIC_ARRAY_NAME, position=MICROPHONE_POSITION)
CreateContext: Context created
Warning: initializing context twice. Will destroy old context and create a new one.
[12]:
# Print some information about the microphone
scene.get_microphone(alias=MIC_ARRAY_NAME)
[12]:
{
"name": "ambeovr",
"micarray_type": "AmbeoVR",
"is_spherical": true,
"channel_layout_type": "mic",
"n_capsules": 4,
"capsule_names": [
"FLU",
"FRD",
"BLD",
"BRU"
],
"coordinates_absolute": [
[
2.5057922796533956,
-0.9942077203466043,
1.0057357643635105
],
[
2.5057922796533956,
-1.0057922796533958,
0.9942642356364896
],
[
2.4942077203466044,
-0.9942077203466043,
0.9942642356364896
],
[
2.4942077203466044,
-1.0057922796533958,
1.0057357643635105
]
],
"coordinates_center": [
2.5,
-1.0,
1.0
]
}
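The coordinates_absolute values above are simply fixed capsule offsets added to coordinates_center. We can sanity-check this with numpy using the numbers printed above: the offsets should sum to zero (the capsules are centred on the array origin) and share a common radius (they sit on a sphere):

```python
import numpy as np

# Capsule positions and array centre, as printed above
center = np.array([2.5, -1.0, 1.0])
absolute = np.array([
    [2.5057922796533956, -0.9942077203466043, 1.0057357643635105],
    [2.5057922796533956, -1.0057922796533958, 0.9942642356364896],
    [2.4942077203466044, -0.9942077203466043, 0.9942642356364896],
    [2.4942077203466044, -1.0057922796533958, 1.0057357643635105],
])

offsets = absolute - center
print(offsets.sum(axis=0))               # ~[0, 0, 0]: centred on the array origin
print(np.linalg.norm(offsets, axis=1))   # equal radii of about 1 cm
```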
Add some sound sources#
Now, we’re ready to add some sound sources.
In AudibleLight, sound sources are represented by audiblelight.event.Event objects. Each Event is associated with one or more audiblelight.worldstate.Emitter objects, which dictate the position of the Event inside the mesh at a single point in time.
For a static sound source, an Event has one Emitter. For a moving sound source, an Event has multiple Emitters, depending on its velocity and resolution.
Note that Emitter objects should never be created directly. Instead, when we create an Event, we’ll automatically create the Emitter objects that it needs.
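To build intuition for how a moving Event might discretise its trajectory into multiple Emitters, here is a sketch of the idea (this is an illustration, not AudibleLight's actual algorithm): sample positions along the path at roughly spatial_resolution IRs per second.

```python
import numpy as np


def interpolate_trajectory(start, end, duration, resolution):
    """Linearly interpolate emitter positions along a straight path,
    sampling roughly `resolution` positions per second (illustrative only)."""
    n_emitters = max(2, int(np.ceil(duration * resolution)) + 1)
    return np.linspace(start, end, num=n_emitters)


# A 2-second event at 1.5 IRs per second -> 4 emitter positions
path = interpolate_trajectory([2.5, -1.0, 2.0], [1.195, -0.704, 1.221], 2.0, 1.5)
print(path.shape)  # (4, 3)
```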
For now, we’ll just add in a small number of static Event objects with random positions and audio files.
[13]:
# Add the correct number of static sources
scene.clear_events()
for _ in range(N_STATIC_EVENTS):
    scene.add_event(event_type="static")
Warning: initializing context twice. Will destroy old context and create a new one.
CreateContext: Context created
Warning: initializing context twice. Will destroy old context and create a new one.
CreateContext: Context created
2025-10-07 15:01:55.494 | INFO | audiblelight.core:add_event:830 - Event added successfully: Static 'Event' with alias 'event000', audio file '/home/huw-cheston/Documents/python_projects/AudibleLight/tests/test_resources/soundevents/doorCupboard/35632.wav' (unloaded, 0 augmentations), 1 emitter(s).
Warning: initializing context twice. Will destroy old context and create a new one.
CreateContext: Context created
2025-10-07 15:01:55.970 | INFO | audiblelight.core:add_event:830 - Event added successfully: Static 'Event' with alias 'event001', audio file '/home/huw-cheston/Documents/python_projects/AudibleLight/tests/test_resources/soundevents/waterTap/205695.wav' (unloaded, 0 augmentations), 1 emitter(s).
CreateContext: Context created
Warning: initializing context twice. Will destroy old context and create a new one.
2025-10-07 15:01:56.436 | INFO | audiblelight.core:add_event:830 - Event added successfully: Static 'Event' with alias 'event002', audio file '/home/huw-cheston/Documents/python_projects/AudibleLight/tests/test_resources/soundevents/maleSpeech/93899.wav' (unloaded, 0 augmentations), 1 emitter(s).
Warning: initializing context twice. Will destroy old context and create a new one.
CreateContext: Context created
2025-10-07 15:01:56.944 | INFO | audiblelight.core:add_event:830 - Event added successfully: Static 'Event' with alias 'event003', audio file '/home/huw-cheston/Documents/python_projects/AudibleLight/tests/test_resources/soundevents/laughter/9547.wav' (unloaded, 0 augmentations), 1 emitter(s).
Add background noise#
In AudibleLight, Ambience objects capture non-moving, non-spatialised sound that is not associated with a particular spatial position. Adding this type of noise can be useful for training robust acoustic imaging systems.
To create Ambience, we have two choices:
- Pass in an audio filepath, which will be tiled to match the duration and channel count of the Scene
- Pass in the name of a particular noise type (e.g., white, pink)
For now, we’ll just add in white noise.
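To see what a noise floor of REF_DB means in practice, here is one common way to scale white noise to a target RMS level in dBFS. Note that this interpretation of ref_db is an assumption for illustration; AudibleLight's internal scaling may differ:

```python
import numpy as np

REF_DB = -50                      # target RMS level in dBFS (assumed definition)
DURATION, SAMPLE_RATE = 30.0, 44100

rng = np.random.default_rng(0)
noise = rng.standard_normal(int(DURATION * SAMPLE_RATE))

# Scale so that 20 * log10(rms) == REF_DB
target_rms = 10 ** (REF_DB / 20)
noise *= target_rms / np.sqrt(np.mean(noise**2))

print(round(20 * np.log10(np.sqrt(np.mean(noise**2))), 2))  # -50.0
```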
[14]:
scene.add_ambience(noise=NOISE_TYPE)
Add more advanced Event types#
AudibleLight has support for many different types of sound events, including sound events that move across a variety of trajectories, sound events placed in particular positions, and sound events with data augmentations (time-frequency domain masking, etc.). For more information, see the tutorial on adding Event objects to a Scene.
For now, we’ll just show how we can create a sound event that moves along a linear trajectory, starting from a position given in polar coordinates with respect to our microphone, with distortion applied to the audio file.
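The polar position below is (azimuth, elevation, distance) relative to the microphone centre. A quick sketch of the conversion to absolute coordinates (a hypothetical helper; AudibleLight's exact angle convention may differ, but this one places [0.0, 90.0, 1.0] one metre directly above the microphone, consistent with the metadata generated later):

```python
import numpy as np


def polar_to_cartesian(azimuth_deg, elevation_deg, distance, center):
    """Convert (azimuth, elevation, distance), with angles in degrees and
    elevation measured up from the horizontal plane, into absolute XYZ."""
    az, el = np.radians(azimuth_deg), np.radians(elevation_deg)
    offset = distance * np.array([
        np.cos(el) * np.cos(az),
        np.cos(el) * np.sin(az),
        np.sin(el),
    ])
    return np.asarray(center, dtype=float) + offset


# Elevation of 90 degrees -> directly above the microphone at [2.5, -1.0, 1.0]
print(polar_to_cartesian(0.0, 90.0, 1.0, [2.5, -1.0, 1.0]))  # ~[2.5, -1.0, 2.0]
```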
[15]:
from audiblelight.augmentation import Distortion

moving_event = scene.add_event(
    event_type="moving",
    alias="telephone",
    filepath=FG_FOLDER / "telephone/30085.wav",
    polar=True,
    position=[0.0, 90.0, 1.0],
    shape="linear",
    scene_start=5.0,  # start five seconds into the scene
    spatial_resolution=1.5,
    spatial_velocity=1.0,
    duration=2,
    augmentations=Distortion,
)
Warning: initializing context twice. Will destroy old context and create a new one.
2025-10-07 15:02:03.334 | INFO | audiblelight.core:add_event:830 - Event added successfully: Moving 'Event' with alias 'telephone', audio file '/home/huw-cheston/Documents/python_projects/AudibleLight/tests/test_resources/soundevents/telephone/30085.wav' (unloaded, 1 augmentations), 4 emitter(s).
CreateContext: Context created
We can also listen to our audio file (note that it has not been spatialised yet, so only the distortion will be audible).
[16]:
from IPython.display import Audio
Audio(moving_event.load_audio(), rate=scene.sample_rate)
[16]:
Synthesise the audio and metadata#
As a recap, we have done the following:
- Created a Scene object from a mesh
- Added multiple static Event objects at random positions
- Added background white noise Ambience
- Added a single moving Event with distortion applied, moving along a linear trajectory from a given position
We can now generate the spatial audio and metadata by calling Scene.generate and providing output paths to save the wav and json files.
[18]:
# Do the generation!
scene.generate(
    audio_fname=str(OUTFOLDER / "audio_out_random.wav"),
    metadata_fname=str(OUTFOLDER / "metadata_out_random.json"),
)
Warning: initializing context twice. Will destroy old context and create a new one.
2025-10-07 15:02:20.693 | INFO | audiblelight.worldstate:simulate:1685 - Starting simulation with 8 emitters, 1 microphones
2025-10-07 15:02:49.395 | INFO | audiblelight.worldstate:simulate:1693 - Finished simulation! Overall indirect ray efficiency: 0.997
CreateContext: Context created
2025-10-07 15:02:54.774 | INFO | audiblelight.synthesize:render_audio_for_all_scene_events:571 - Rendered scene audio in 4.70 seconds!
The audio file and metadata should now be accessible inside our output folder.
[19]:
# Pretty print the metadata JSON
print(repr(scene))
{
"audiblelight_version": "0.1.0",
"rlr_audio_propagation_version": "0.0.1",
"creation_time": "2025-08-20_12:07:50",
"duration": 30.0,
"ref_db": -50,
"max_overlap": 3,
"fg_path": "/home/huw-cheston/Documents/python_projects/AudibleLight/tests/test_resources/soundevents",
"ambience": {
"ambience000": {
"alias": "ambience000",
"beta": 0,
"filepath": null,
"channels": 4,
"sample_rate": 44100,
"duration": 30.0,
"ref_db": -50,
"noise_kwargs": {}
}
},
"events": {
"event000": {
"alias": "event000",
"filename": "236657.wav",
"filepath": "/home/huw-cheston/Documents/python_projects/AudibleLight/tests/test_resources/soundevents/femaleSpeech/236657.wav",
"class_id": null,
"class_label": null,
"is_moving": false,
"scene_start": 24.235181686117272,
"scene_end": 24.653979872058315,
"event_start": 0.0,
"event_end": 0.41879818594104307,
"duration": 0.41879818594104307,
"snr": 6.748027382626969,
"sample_rate": 44100.0,
"spatial_resolution": null,
"spatial_velocity": null,
"num_emitters": 1,
"emitters": [
[
4.35991043396444,
-3.044263530204878,
0.5891032136364933
]
],
"emitters_relative": {
"ambeovr": [
[
312.29653071540775,
-8.456447367319983,
2.7941217533134375
]
]
},
"augmentations": []
},
"event001": {
"alias": "event001",
"filename": "205695.wav",
"filepath": "/home/huw-cheston/Documents/python_projects/AudibleLight/tests/test_resources/soundevents/waterTap/205695.wav",
"class_id": null,
"class_label": null,
"is_moving": false,
"scene_start": 3.4514953647255586,
"scene_end": 9.511903527990864,
"event_start": 0.0,
"event_end": 6.060408163265306,
"duration": 6.060408163265306,
"snr": 2.723226107271114,
"sample_rate": 44100.0,
"spatial_resolution": null,
"spatial_velocity": null,
"num_emitters": 1,
"emitters": [
[
3.096275782297078,
-4.153302480826513,
1.2712634812270731
]
],
"emitters_relative": {
"ambeovr": [
[
280.7079487322296,
4.831569403526189,
3.2206280785567376
]
]
},
"augmentations": []
},
"event002": {
"alias": "event002",
"filename": "236385.wav",
"filepath": "/home/huw-cheston/Documents/python_projects/AudibleLight/tests/test_resources/soundevents/femaleSpeech/236385.wav",
"class_id": null,
"class_label": null,
"is_moving": false,
"scene_start": 8.323269416047072,
"scene_end": 8.715559665480178,
"event_start": 0.0,
"event_end": 0.3922902494331066,
"duration": 0.3922902494331066,
"snr": 9.022142809631786,
"sample_rate": 44100.0,
"spatial_resolution": null,
"spatial_velocity": null,
"num_emitters": 1,
"emitters": [
[
-0.3581484310736043,
-6.376464854662041,
0.4073888355057993
]
],
"emitters_relative": {
"ambeovr": [
[
242.00472446892127,
-5.558837275985748,
6.117726275320578
]
]
},
"augmentations": []
},
"event003": {
"alias": "event003",
"filename": "242663.wav",
"filepath": "/home/huw-cheston/Documents/python_projects/AudibleLight/tests/test_resources/soundevents/femaleSpeech/242663.wav",
"class_id": null,
"class_label": null,
"is_moving": false,
"scene_start": 15.818482566614847,
"scene_end": 16.271271682261105,
"event_start": 0.0,
"event_end": 0.4527891156462585,
"duration": 0.4527891156462585,
"snr": 6.620465733484581,
"sample_rate": 44100.0,
"spatial_resolution": null,
"spatial_velocity": null,
"num_emitters": 1,
"emitters": [
[
5.050466347179171,
-0.5452550625065076,
0.22202123889172043
]
],
"emitters_relative": {
"ambeovr": [
[
10.109529882311296,
-16.71490561353896,
2.704981053354163
]
]
},
"augmentations": []
},
"telephone": {
"alias": "telephone",
"filename": "30085.wav",
"filepath": "/home/huw-cheston/Documents/python_projects/AudibleLight/tests/test_resources/soundevents/telephone/30085.wav",
"class_id": null,
"class_label": null,
"is_moving": true,
"scene_start": 5.0,
"scene_end": 7.0,
"event_start": 0.0,
"event_end": 2.0,
"duration": 2.0,
"snr": 4.650165135352253,
"sample_rate": 44100.0,
"spatial_resolution": 1.5,
"spatial_velocity": 1.0,
"num_emitters": 4,
"emitters": [
[
2.5,
-1.0,
2.0
],
[
2.065109573640491,
-0.9013119306521276,
1.74037986035627
],
[
1.6302191472809824,
-0.8026238613042551,
1.4807597207125398
],
[
1.1953287209214736,
-0.7039357919563827,
1.2211395810688095
]
],
"emitters_relative": {
"ambeovr": [
[
0.0,
90.0,
1.0
],
[
167.2146100280266,
58.93850546023989,
0.8643097567376731
],
[
167.2146100280266,
28.32608625020126,
1.0132156635892788
],
[
167.2146100280266,
9.385879919956016,
1.3559955295103967
]
]
},
"augmentations": [
{
"name": "Distortion",
"sample_rate": 44100,
"drive_db": 22.644900847314595
}
]
}
},
"state": {
"emitters": {
"event000": [
[
4.35991043396444,
-3.044263530204878,
0.5891032136364933
]
],
"event001": [
[
3.096275782297078,
-4.153302480826513,
1.2712634812270731
]
],
"event002": [
[
-0.3581484310736043,
-6.376464854662041,
0.4073888355057993
]
],
"event003": [
[
5.050466347179171,
-0.5452550625065076,
0.22202123889172043
]
],
"telephone": [
[
2.5,
-1.0,
2.0
],
[
2.065109573640491,
-0.9013119306521276,
1.74037986035627
],
[
1.6302191472809824,
-0.8026238613042551,
1.4807597207125398
],
[
1.1953287209214736,
-0.7039357919563827,
1.2211395810688095
]
]
},
"microphones": {
"ambeovr": {
"name": "ambeovr",
"micarray_type": "AmbeoVR",
"is_spherical": true,
"n_capsules": 4,
"capsule_names": [
"FLU",
"FRD",
"BLD",
"BRU"
],
"coordinates_absolute": [
[
2.5057922796533956,
-0.9942077203466043,
1.0057357643635105
],
[
2.5057922796533956,
-1.0057922796533958,
0.9942642356364896
],
[
2.4942077203466044,
-0.9942077203466043,
0.9942642356364896
],
[
2.4942077203466044,
-1.0057922796533958,
1.0057357643635105
]
],
"coordinates_center": [
2.5,
-1.0,
1.0
]
}
},
"mesh": {
"fname": "Oyens",
"ftype": ".glb",
"fpath": "/home/huw-cheston/Documents/python_projects/AudibleLight/tests/test_resources/meshes/Oyens.glb",
"units": "meters",
"from_gltf_primitive": false,
"name": "defaultobject",
"node": "defaultobject",
"bounds": [
[
-3.0433080196380615,
-10.448445320129395,
-1.1850370168685913
],
[
5.973234176635742,
2.101027011871338,
2.4577369689941406
]
],
"centroid": [
1.527919030159762,
-4.550817438070386,
1.162934397641578
]
},
"rlr_config": {
"diffraction": 1,
"direct": 1,
"direct_ray_count": 500,
"direct_sh_order": 3,
"frequency_bands": 4,
"global_volume": 1.0,
"hrtf_back": [
0.0,
0.0,
1.0
],
"hrtf_right": [
1.0,
0.0,
0.0
],
"hrtf_up": [
0.0,
1.0,
0.0
],
"indirect": 1,
"indirect_ray_count": 5000,
"indirect_ray_depth": 200,
"indirect_sh_order": 1,
"max_diffraction_order": 10,
"max_ir_length": 4.0,
"mesh_simplification": 0,
"sample_rate": 44100.0,
"size": 146,
"source_ray_count": 200,
"source_ray_depth": 10,
"temporal_coherence": 0,
"thread_count": 1,
"transmission": 1,
"unit_scale": 1.0
},
"empty_space_around_mic": 0.1,
"empty_space_around_emitter": 0.2,
"empty_space_around_surface": 0.2,
"empty_space_around_capsule": 0.05,
"repair_threshold": null
}
}
Create DCASE-style metadata#
The DCASE challenges use a special metadata format, described in more detail on the DCASE website.
AudibleLight can generate this metadata from a Scene. Combined with the spatial audio we just generated, this is enough to train a model like SELDNet (sharathadavanne/seld-dcase2023).
[20]:
from audiblelight.synthesize import generate_dcase2024_metadata
dcase_out = generate_dcase2024_metadata(scene)
{'ambeovr': active_class_index source_number_index azimuth elevation \
frame_number
28 7 0 -157 3
29 7 0 -157 3
30 7 0 -157 3
31 7 0 -157 3
32 7 0 -157 3
... ... ... ... ...
129 10 0 -100 -24
130 10 0 -100 -24
131 10 0 -100 -24
132 10 0 -100 -24
133 10 0 -100 -24
distance
frame_number
28 337
29 337
30 337
31 337
32 337
... ...
129 196
130 196
131 196
132 196
133 196
[126 rows x 5 columns]}
By default, this function creates a dictionary of pandas.DataFrame objects, one for every microphone added to our scene. We can easily print just the first few frames for our AmbeoVR microphone:
[21]:
dcase_out["ambeovr"].head()
[21]:
| frame_number | active_class_index | source_number_index | azimuth | elevation | distance |
|---|---|---|---|---|---|
| 28 | 7 | 0 | -157 | 3 | 337 |
| 29 | 7 | 0 | -157 | 3 | 337 |
| 30 | 7 | 0 | -157 | 3 | 337 |
| 31 | 7 | 0 | -157 | 3 | 337 |
| 32 | 7 | 0 | -157 | 3 | 337 |
For more information on what any of these columns mean, refer to the DCASE community website.
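DCASE metadata is conventionally saved as one headerless CSV per scene, with the frame number as the first column. A sketch of exporting one of the per-microphone DataFrames in that shape (the toy DataFrame below stands in for the real output; the filename convention is up to you):

```python
import io

import pandas as pd

# Toy stand-in for one per-microphone DataFrame returned by
# generate_dcase2024_metadata (values copied from the output above)
df = pd.DataFrame(
    {
        "active_class_index": [7, 7, 7],
        "source_number_index": [0, 0, 0],
        "azimuth": [-157, -157, -157],
        "elevation": [3, 3, 3],
        "distance": [337, 337, 337],
    },
    index=pd.Index([28, 29, 30], name="frame_number"),
)

# DCASE-style CSV: frame number first, no header row
buf = io.StringIO()
df.to_csv(buf, header=False)
print(buf.getvalue().splitlines()[0])  # 28,7,0,-157,3,337
```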
Recreating a Scene from metadata#
Finally, note that we can also re-create a Scene object from scratch, just by reloading our JSON:
[20]:
reloaded_scene = Scene.from_json(str(OUTFOLDER / "metadata_out_random.json"))
assert reloaded_scene == scene
2025-08-20 12:07:50.575 | WARNING | audiblelight.core:from_dict:1115 - Currently, distributions cannot be loaded with `Scene.from_dict`. You will need to manually redefine these using, for instance, setattr(scene, 'event_start_dist', ...), repeating this for every distribution.
CreateContext: Context created
Material for category 'default' was not found. Using default material instead.
Material for category 'default' was not found. Using default material instead.
CreateContext: Context created
That’s the end of the quickstart guide for AudibleLight! For more information, check out the rest of the tutorials or take a look at the API documentation.